When the latent ability variable is discrete the distribution of the observed item responses is a finite 
mixture, and the EM algorithm for finite mixtures can be used. 

Maximum likelihood estimates of the item parameters and of the 
discrete probabilities of the latent ability distribution are given 
using the EM algorithm for finite mixtures. Results are presented in 
general for both dichotomous and polytomous item response models. The 
relation between the EM estimates and the Bock-Aitken marginal 
maximum likelihood estimates is discussed. Estimates for the item 
parameters will depend on the specific form of the item response 
functions, and will usually require iterative numerical procedures. 
The EM algorithm is the same as the Bock-Aitken algorithm (R. D. Bock 
and M. Aitken, 1981) for marginal maximum likelihood estimation of 
the item parameters. (Contains 28 references.) (Author/SLD) 



ACT Research Report Series 96-6 



Estimation of Item Response Models 
Using the EM Algorithm for Finite 
Mixtures 



David J. Woodruff 
Bradley A. Hanson 



For additional copies write: 

ACT Research Report Series 
P.O. Box 168 
Iowa City, Iowa 52243-0168 



© 1996 by ACT, Inc. All rights reserved. 







Abstract 



This paper presents a detailed description of maximum likelihood parameter estimation 
for item response models using the general EM algorithm. In this paper the models 
are specified using a univariate discrete latent ability variable. When the latent ability 
variable is discrete the distribution of the observed item responses is a finite mixture, and 
the EM algorithm for finite mixtures can be used. Maximum likelihood estimates of the 
item parameters and of the discrete probabilities of the latent ability distribution are given 
using the EM algorithm for finite mixtures. Results are presented in general for both 
dichotomous and polytomous item response models. The relation between the EM estimates 
and Bock-Aitken marginal maximum likelihood estimates is discussed. 



Estimation for Item Response Models using the 
EM Algorithm for Finite Mixtures 

The purpose of this paper is to present a fairly simple and unified treatment of how 
the general EM algorithm can be used to obtain maximum likelihood estimates (MLEs) 
of both the item parameters and the probability distribution of the latent ability variable 
for item response models. The approach taken in this paper is to assume the latent ability 
variable being measured by the items is discrete. When the latent ability variable is discrete 
the distribution of the observed data is a finite mixture (Titterington, Smith, and Makov, 
1985). With a discrete latent ability variable the EM algorithm for finding maximum 
likelihood estimates for finite mixtures can be used (Dempster, Laird, and Rubin, 1977; 
Titterington, Smith, and Makov, 1985). 

This paper clarifies previously established results using a finite mixture approach. A 
complete, self-contained description of maximum likelihood parameter estimates of item 
response models for dichotomous and polytomous items using the EM algorithm for finite 
mixtures is presented. The use of the finite mixture model allows a variety of previously 
disparate results to be consolidated in a single, relatively simple approach that allows a 
straightforward presentation with pedagogic value. 

Versions of the results in this paper have been presented by others for a variety of 
specific item response models. Maximum likelihood estimates of item parameters using the 
EM algorithm have been presented for a variety of item response models for dichotomous 
items (Bock and Aitken, 1981; Thissen, 1982; Rigdon and Tsutakawa, 1983; Tsutakawa, 
1984; Bartholomew, 1987; Harwell, Baker, and Zwarts, 1988; Baker, 1992) and polytomous 
items (Thissen and Steinberg, 1984; Bartholomew, 1987; Muraki, 1992; Wilson and Adams, 
1993). The EM algorithm for finite mixtures has been applied in estimating parameters 
for the Rasch model by De Leeuw & Verhelst (1986) and Follmann (1988). The maximum 
likelihood estimates of the probabilities of the discrete latent ability distribution presented 
here were given by Bock and Aitken (1981), Mislevy (1984), and Titterington, Smith, and 
Makov (1985). 

The data to be modeled are the responses of $i = 1, \ldots, N$ examinees, randomly sampled 
from a population of examinees, to a fixed non-random set of $j = 1, \ldots, n$ items. The 
responses of the $N$ examinees to the $n$ items are contained in an $n \times N$ matrix $Y$ made up 
of $n \times 1$ column vectors $y_1, \ldots, y_i, \ldots, y_N$, where $y_i$ contains the responses of the $i$th 
randomly sampled examinee to the $n$ fixed items. The matrix $Y$ is given by 

$$
Y = [y_1, \ldots, y_i, \ldots, y_N] . \tag{1}
$$

The $j$th element of $y_i$ (the response of the $i$th randomly sampled examinee to item $j$) is 
denoted $y_{ij}$. It is assumed that the set of responses to each item is finite. If the responses 
are dichotomous then the possible values of $y_{ij}$ are 0 and 1. If the responses are polytomous 
then the possible values of $y_{ij}$ are taken to be the integers $0, 1, \ldots, L_j - 1$ (item $j$ has $L_j$ 
response categories). In practical applications values of the polytomous items need not be 
integers or even ordered. Note that different items may have different numbers of response 
categories. 

Associated with item $j$ is a set of $v_j$ item parameters denoted by the $v_j \times 1$ column 
vector $\delta_j$. The parameters for all $n$ items are represented by $\Delta$, the collection of all $\delta_j$ 
column vectors, that is $\Delta = [\delta_1, \ldots, \delta_j, \ldots, \delta_n]$. When the number of item response 
categories is the same for every item (e.g., dichotomous items) then the number of parameters 
will typically be the same for every item so that $v_j = v$ for all $j$. 

In addition to the observed item responses, there is a realization of a latent ability 
random variable $\Theta$ for each randomly sampled examinee. Unlike the realization of the 
item responses, the realization of $\Theta$ for the $i$th randomly sampled examinee (denoted $\theta_i$) 
is not observed. The value $\theta_i$ is sometimes referred to as the “ability” of the $i$th randomly 
sampled examinee. In this paper the term “latent variable” will typically be used in place 
of “ability.” 

The latent random variable $\Theta$ is usually considered to be continuous. In this paper 
the latent variable is taken to be discrete, and estimation procedures are derived based on 
the discrete latent variable. This is opposed to deriving estimation procedures based on a 
continuous latent variable and then implementing approximations of those procedures with 
a discrete version of the continuous latent variable (e.g., Bock and Aitken, 1981; Muraki, 
1992). In this paper the approximation of the continuous latent variable with a discrete 
latent variable is done in the model specification. This allows straightforward application 
of the EM algorithm for finite mixtures. 

The latent random variable $\Theta$ can take on $m$ known discrete values $\theta_k$, $k = 1, \ldots, m$, 
with associated unknown probabilities $\pi_k$, $k = 1, \ldots, m$ (a short discussion of choosing 
the value of $m$ is given in the Discussion section at the end of the paper). The values 
of $\theta_k$ are chosen at the initiation of the estimation process and determine the scale of 
the latent variable. Typically the scale of the latent variable can only be known up to a 
linear transformation, so the model is invariant to a linear transformation of the $\theta_k$ (along 
with an associated transformation of the item parameters). The $m \times 1$ column vector of 
latent probabilities is given by $\pi = (\pi_1, \ldots, \pi_m)^{\top}$. The random variable $\Theta$ has a probability 
distribution defined over the population of examinees, $\Pr(\Theta = \theta_k \mid \pi)$, that can be denoted 
variously as 

$$
\begin{aligned}
\Pr(\Theta = \theta_k \mid \pi) &= \Pr(\Theta = \theta_k \mid \pi_k) \\
&= p(\theta_k \mid \pi) = p(\theta_k \mid \pi_k) = \pi_k .
\end{aligned}
\tag{2}
$$



In this paper the latent random variable $\Theta$ is taken to be univariate. It is possible to 
generalize the formulas presented to the case of a multivariate $\Theta$. A multivariate $\Theta$ would 
greatly increase the computational effort required to compute estimates. 

The EM Algorithm for Finite Mixtures 

Let $f(y \mid \Delta, \pi)$ be the probability distribution for the observed item responses ($y$ is 
a vector of realizations of the item response random variables). When the latent variable 
is discrete $f(y \mid \Delta, \pi)$ is given by 

$$
\begin{aligned}
f(y \mid \Delta, \pi) &= \sum_{k=1}^{m} f(y, \theta_k \mid \Delta, \pi) \\
&= \sum_{k=1}^{m} f(y \mid \theta_k, \Delta, \pi)\, p(\theta_k \mid \Delta, \pi) \\
&= \sum_{k=1}^{m} f(y \mid \theta_k, \Delta)\, p(\theta_k \mid \pi) \\
&= \sum_{k=1}^{m} f(y \mid \theta_k, \Delta)\, \pi_k ,
\end{aligned}
\tag{3}
$$

where $f(y, \theta_k \mid \Delta, \pi)$ is the joint probability distribution of the item responses and the 
latent variable, and $f(y \mid \theta_k, \Delta)$ is the conditional probability distribution of the item 
responses for examinees with a fixed value of the latent variable $\theta_k$. The third and fourth 
lines of Equation 3 are obtained by using the equalities 

$$
f(y \mid \theta_k, \Delta, \pi) = f(y \mid \theta_k, \Delta), \tag{4}
$$

and 

$$
p(\theta_k \mid \Delta, \pi) = p(\theta_k \mid \pi) = \pi_k . \tag{5}
$$

Equation 4 follows from the assumption made in item response models that conditioned 
on the latent variable $\Theta$ the probability of an examinee’s responses to $n$ items does not 
depend on the probability distribution of the latent variable in the population of examinees. 
Equation 5 follows from the fact that the probability distribution of $\Theta$ in the population 
of examinees does not depend on the item parameters for the $n$ items. 

The expression for $f(y \mid \Delta, \pi)$ in Equation 3 is a finite mixture (Titterington, Smith, 
and Makov, 1985). That is, from the last line in Equation 3 it can be seen that $f(y \mid \Delta, \pi)$ 
is a sum of component densities $f(y \mid \theta_k, \Delta)$ with associated mixing weights $\pi_k$. 

The EM algorithm for finding maximum likelihood estimates of the parameters of a 
finite mixture is described by Dempster, Laird, and Rubin (1977, section 4.3), and 
Titterington, Smith, and Makov (1985, section 4.3.2). The presentation of the EM algorithm 
for finite mixtures in this paper uses somewhat different notation than that used in those 
presentations. Dempster, Laird, and Rubin (1977) and Titterington, Smith, and Makov 
(1985) use an indicator vector $z_i$ in place of $\theta_i$, where $z_i$ is of length $m$ with a one in the 
position indicating the category of the latent variable for examinee $i$ and zeros elsewhere. 
The present notation is more consistent with notation used in the psychometric literature. 

The observed data are $(y_1, \ldots, y_N)$, the missing data are $(\theta_1, \ldots, \theta_N)$, and the 
complete data are $[(y_1, \theta_1), \ldots, (y_N, \theta_N)]$. The complete data likelihood for the sample is 

$$
\prod_{i=1}^{N} f(y_i, \theta_i \mid \Delta, \pi), \tag{6}
$$

where $f(y_i, \theta_i \mid \Delta, \pi)$ is the complete data likelihood for examinee $i$. The observed data 
likelihood for the sample is 

$$
\prod_{i=1}^{N} f(y_i \mid \Delta, \pi) = \prod_{i=1}^{N} \sum_{k=1}^{m} f(y_i \mid \theta_k, \Delta)\, \pi_k , \tag{7}
$$

where $f(y_i \mid \Delta, \pi)$ is the observed data likelihood for examinee $i$. The EM algorithm uses 
the complete data likelihood to find values of the parameters $\Delta$ and $\pi$ which maximize 
the observed data likelihood (Dempster, Laird, and Rubin, 1977). 

The general EM algorithm generates a sequence of estimates $(\Delta^{(s)}, \pi^{(s)})$, $s = 1, 2, \ldots$, 
given starting values $(\Delta^{(0)}, \pi^{(0)})$. There are two steps in each iteration: the E step and 
the M step. In the E step the conditional expectation of the complete data log-likelihood is 
taken, where the conditional expectation is with respect to the conditional distribution of 
the missing data given the observed data and some fixed known values of the parameters. 
Let $\Theta_i$ be the random variable representing the latent variable for examinee $i$ ($\Theta_i$, $i = 
1, \ldots, N$, are independent and identically distributed), and let $\Theta = (\Theta_1, \ldots, \Theta_N)$. The 
conditional expectation evaluated at the E step for iteration $s$, $s = 0, 1, \ldots$, is 

$$
Q[(\Delta, \pi) \mid (\Delta^{(s)}, \pi^{(s)})] = E_{\Theta}\left\{ \log\left[ \prod_{i=1}^{N} f(y_i, \theta_i \mid \Delta, \pi) \right] \,\middle|\, Y, \Delta^{(s)}, \pi^{(s)} \right\}, \tag{8}
$$

where the expected value is over the conditional distribution of the missing data given 
the observed data and fixed known values of the parameters $(\Delta^{(s)}, \pi^{(s)})$. Equation 8 is 
the expression used in the E step of the general EM algorithm, as it is the expectation 
of the complete data log-likelihood. Complete data sufficient statistics (other than the 
observations themselves) are not used. 

The M step finds values of $\Delta$ and $\pi$ that maximize the conditional expectation of 
the complete data log-likelihood. The M step at iteration $s$, $s = 0, 1, \ldots$, finds $(\Delta, \pi) = 
(\Delta^{(s+1)}, \pi^{(s+1)})$ to maximize $Q[(\Delta, \pi) \mid (\Delta^{(s)}, \pi^{(s)})]$. The new estimates $(\Delta^{(s+1)}, \pi^{(s+1)})$ 
produced in the M step at iteration $s$ are used in the E step at iteration $s + 1$. Iterations 
continue until convergence is obtained. 
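
The alternation of E and M steps described above can be sketched as a small driver loop. This is only an illustration under stated assumptions, not the authors' implementation: the names `run_em`, `e_step`, and `m_step` are hypothetical, convergence is judged by the largest change in the parameter vector, and the toy example treats the component likelihoods $f(y_i \mid \theta_k, \Delta)$ as known so that only the mixing weights are estimated.

```python
def run_em(e_step, m_step, params, tol=1e-10, max_iter=1000):
    """Alternate E and M steps until the parameter vector stops changing."""
    for _ in range(max_iter):
        new_params = m_step(e_step(params))
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:
            break
    return params

# Toy finite mixture (hypothetical numbers): lik[i][k] plays the role of
# f(y_i | theta_k, Delta) and is held fixed, so only pi is updated.
lik = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]

def e_step(pi):
    """Posterior probability of each latent category for each observation."""
    post = []
    for row in lik:
        joint = [f * p for f, p in zip(row, pi)]
        total = sum(joint)
        post.append([j / total for j in joint])
    return post

def m_step(post):
    """New mixing weights: the average posterior probability per category."""
    return [sum(row[k] for row in post) / len(post) for k in range(2)]

pi_hat = run_em(e_step, m_step, [0.5, 0.5])
```

At convergence `pi_hat` is (numerically) a fixed point of one further E and M step, which is the stopping condition the text refers to.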

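The expectation in Equation 8 can be written as 

$$
\begin{aligned}
E_{\Theta}&\left\{ \log\left[ \prod_{i=1}^{N} f(y_i, \theta_i \mid \Delta, \pi) \right] \,\middle|\, Y, \Delta^{(s)}, \pi^{(s)} \right\} \\
&= E_{\Theta}\left\{ \sum_{i=1}^{N} \log[f(y_i, \theta_i \mid \Delta, \pi)] \,\middle|\, Y, \Delta^{(s)}, \pi^{(s)} \right\} \\
&= \sum_{i=1}^{N} E_{\Theta_i}\left\{ \log[f(y_i, \theta_i \mid \Delta, \pi)] \,\middle|\, y_i, \Delta^{(s)}, \pi^{(s)} \right\} \\
&= \sum_{i=1}^{N} \sum_{k=1}^{m} \left\{ \log[f(y_i, \theta_k \mid \Delta, \pi)]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log[f(y_i, \theta_k \mid \Delta, \pi)]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\},
\end{aligned}
\tag{9}
$$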

where $p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)})$ is the conditional probability that $\Theta_i = \theta_k$ given fixed known 
values $y_i$, $\Delta^{(s)}$, and $\pi^{(s)}$. Note that $\log[f(y_i, \theta_i \mid \Delta, \pi)]$ is simply treated as a function of 
the discrete random variable $\Theta_i$ with respect to which its expectation is being taken. That 
is why in the first three lines of Equation 9 the realization of the latent random variable 
$\Theta_i$ for the $i$th randomly sampled examinee is denoted $\theta_i$, whereas in the last two lines, 
where the conditional expectation has been made explicit, the $\theta_i$ (the unknown realization 
of the latent variable for examinee $i$) are changed to $\theta_k$ (the known values of the latent 
variable that could be realized for an examinee) in accordance with the expectation over 
the discrete distribution of the latent random variable. It should be noted that $\Delta$ and $\pi$ 
are free unknown quantities for which estimates are found in the M step, whereas $\Delta^{(s)}$ 
and $\pi^{(s)}$ are fixed known quantities that have been computed at step $s - 1$. 

Estimation of $\Delta$ and $\pi$ can be simplified by separating Equation 9 into two additive 
terms with the first depending only on $\Delta$ and the second depending only on $\pi$. In this 
way the derivative of the first term can be taken with respect to $\Delta$, the result set to zero 
and solved for $\Delta$. Similarly, the derivative of the second term can be taken with respect 
to $\pi$, the result set to zero and solved for $\pi$. Consequently, M step estimates can be 
calculated separately for $\Delta$ and $\pi$. The estimates of $\pi$ are easy to compute (a closed-form 
solution exists). The computation of estimates of $\Delta$ will typically require iterative 
numerical methods. 

To separate Equation 9 into one part depending only on $\Delta$ and one part depending 
only on $\pi$ note that using Equations 4 and 5 (substituting $y_i$ for $y$) 

$$
\begin{aligned}
f(y_i, \theta_k \mid \Delta, \pi) &= f(y_i \mid \theta_k, \Delta, \pi)\, p(\theta_k \mid \Delta, \pi) \\
&= f(y_i \mid \theta_k, \Delta)\, \pi_k .
\end{aligned}
\tag{10}
$$

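Substituting Equation 10 into Equation 9 gives 

$$
\begin{aligned}
\sum_{k=1}^{m} \sum_{i=1}^{N} &\left\{ \log[f(y_i, \theta_k \mid \Delta, \pi)]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log[f(y_i \mid \theta_k, \Delta)\, \pi_k]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} .
\end{aligned}
\tag{11}
$$

The right side of Equation 11 can be written as 

$$
\begin{aligned}
&\sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log[f(y_i \mid \theta_k, \Delta)]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&\quad + \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log(\pi_k)\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} .
\end{aligned}
\tag{12}
$$

The first term in Equation 12 involves only $\Delta$, and the second term in Equation 12 involves 
only $\pi$ (all other terms, including $\theta_k$, are constants). The first term in Equation 12 will 
be denoted by 

$$
\phi(\Delta) = \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log[f(y_i \mid \theta_k, \Delta)]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\}, \tag{13}
$$

and the second term in Equation 12 will be denoted 

$$
\psi(\pi) = \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log(\pi_k)\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} . \tag{14}
$$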

In the E step at iteration $s$, $s = 0, 1, \ldots$, the $mN$ ($m$ categories of the latent variable 
for each of $N$ examinees) conditional probabilities $p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)})$ are computed using 
the values of $\Delta^{(s)}$ and $\pi^{(s)}$ computed in the M step at iteration $s - 1$, or in the case of 
$s = 0$ the starting values. Note that for the first iteration ($s = 0$) of the EM algorithm, 
the values of $p(\theta_k \mid y_i, \Delta^{(0)}, \pi^{(0)})$ can be specified directly instead of specifying a $\Delta^{(0)}$ 
and $\pi^{(0)}$ and computing $p(\theta_k \mid y_i, \Delta^{(0)}, \pi^{(0)})$. The values of $p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)})$ are then 
substituted into Equations 13 and 14 at iteration $s$, and the M step at iteration $s$ consists 
of two parts: (1) finding values of $\Delta$ that maximize $\phi(\Delta)$ [these will be $\Delta^{(s+1)}$], and (2) 
finding values of $\pi$ that maximize $\psi(\pi)$ [these will be $\pi^{(s+1)}$]. 

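Using the definition of conditional probability, the probabilities $p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)})$ 
that are computed in the E step can be expressed as 

$$
\begin{aligned}
p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) &= \frac{f(y_i, \theta_k \mid \Delta^{(s)}, \pi^{(s)})}{f(y_i \mid \Delta^{(s)}, \pi^{(s)})} \\
&= \frac{f(y_i, \theta_k \mid \Delta^{(s)}, \pi^{(s)})}{\sum_{k'=1}^{m} f(y_i, \theta_{k'} \mid \Delta^{(s)}, \pi^{(s)})} \\
&= \frac{f(y_i \mid \theta_k, \Delta^{(s)}, \pi^{(s)})\, p(\theta_k \mid \Delta^{(s)}, \pi^{(s)})}{\sum_{k'=1}^{m} f(y_i \mid \theta_{k'}, \Delta^{(s)}, \pi^{(s)})\, p(\theta_{k'} \mid \Delta^{(s)}, \pi^{(s)})} \\
&= \frac{f(y_i \mid \theta_k, \Delta^{(s)})\, \pi_k^{(s)}}{\sum_{k'=1}^{m} f(y_i \mid \theta_{k'}, \Delta^{(s)})\, \pi_{k'}^{(s)}} ,
\end{aligned}
\tag{15}
$$

where $\pi_k^{(s)}$ is the $k$th element of $\pi^{(s)}$. The subscript $k'$ on the right side of Equation 15 is 
used in sums over all possible values of the latent variable, whereas the subscript $k$ denotes a 
specific value of the latent variable. The final result in Equation 15 for $p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)})$ 
is an application of Bayes Theorem. Equation 15 is used for the E step computation. Note 
that Equation 15 applies for any item response model. 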
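
As a small numerical illustration of Equation 15, the sketch below computes the posterior probabilities for a single examinee from the conditional likelihoods and the current mixing weights. The function name `posterior_weights` and the numbers are hypothetical; the conditional likelihoods $f(y_i \mid \theta_k, \Delta^{(s)})$ are assumed to have been computed already for whatever item response model is in use.

```python
def posterior_weights(cond_lik, pi):
    """Return p(theta_k | y_i, Delta, pi) for one examinee (Equation 15).

    cond_lik[k] is f(y_i | theta_k, Delta); pi[k] is the current weight pi_k.
    """
    joint = [f_k * p_k for f_k, p_k in zip(cond_lik, pi)]  # numerators
    marginal = sum(joint)                                   # f(y_i | Delta, pi)
    return [j / marginal for j in joint]

# Three latent categories with made-up likelihoods and weights.
post = posterior_weights([0.02, 0.10, 0.08], [0.25, 0.50, 0.25])
```

Because the denominator is the sum of the numerators, the returned probabilities sum to one for every examinee, which is the fact later used in Equation 31.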

Details of the EM algorithm for calculating estimates of the item parameters and 
latent variable distribution for item response models with dichotomous and polytomous 
items are presented in the following sections. 



Computing Item Parameter Estimates for Dichotomous Items 

This section presents details of the EM algorithm for computing item parameter 
estimates for dichotomous item response models (where $L_j = 2$ for all $j$). The two possible 
responses to each item are scored 0 (incorrect) and 1 (correct). 

For dichotomous items, Equation 13 can be written in terms of item response functions 
for each item. The item response function for item $j$ is a function of a value of the latent 
variable and the item parameters associated with item $j$. The item response function gives 
the probability of an examinee with latent variable value $\theta$ answering item $j$ correctly, 
and will be denoted $P(\theta, \delta_j)$. The probability of an examinee with a latent variable value 
of $\theta$ answering item $j$ incorrectly is $Q(\theta, \delta_j) = 1 - P(\theta, \delta_j)$. Examples of item response 
functions are normal and logistic ogives (Lord, 1980). 

For item $j$ with item parameters $\delta_j$ and randomly sampled examinee $i$ with ability 
value $\theta_k$ the probability of item response $y_{ij}$ is given by 

$$
f(y_{ij} \mid \theta_k, \delta_j) = P(\theta_k, \delta_j)^{y_{ij}}\, Q(\theta_k, \delta_j)^{1 - y_{ij}} . \tag{16}
$$

It is assumed that conditioned on the value of the latent variable for the examinee, the 
examinee’s responses to the items are mutually independent (this is the assumption of 
local independence). Under local independence $f(y_i \mid \theta_k, \Delta)$ can be written as 

$$
f(y_i \mid \theta_k, \Delta) = \prod_{j=1}^{n} f(y_{ij} \mid \theta_k, \delta_j) = \prod_{j=1}^{n} P(\theta_k, \delta_j)^{y_{ij}}\, Q(\theta_k, \delta_j)^{1 - y_{ij}} . \tag{17}
$$
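
Equations 16 and 17 can be illustrated with a short sketch. The paper leaves the form of $P(\theta, \delta_j)$ open; purely as an assumption for this example, a two-parameter logistic function with $\delta_j = (a_j, b_j)$ is used, and the names `irf_2pl` and `cond_likelihood` are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Assumed two-parameter logistic form for P(theta, delta_j)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def cond_likelihood(y, theta, items):
    """f(y_i | theta_k, Delta) under local independence (Equation 17)."""
    lik = 1.0
    for y_ij, (a, b) in zip(y, items):
        p = irf_2pl(theta, a, b)
        lik *= p if y_ij == 1 else 1.0 - p   # P^y * Q^(1 - y), Equation 16
    return lik

# Three hypothetical items given as (a_j, b_j) pairs.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]
lik = cond_likelihood([1, 1, 0], 0.0, items)
```

For a fixed value of the latent variable, summing `cond_likelihood` over all $2^n$ response patterns gives one, as it must for a probability distribution.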



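Equation 13 can be written using Equation 17 as 

$$
\begin{aligned}
\phi(\Delta) &= \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log\left[ \prod_{j=1}^{n} f(y_{ij} \mid \theta_k, \delta_j) \right] p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \sum_{i=1}^{N} \sum_{j=1}^{n} \left\{ \log[f(y_{ij} \mid \theta_k, \delta_j)]\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \sum_{j=1}^{n} \left\{ \log[P(\theta_k, \delta_j)] \left[ \sum_{i=1}^{N} y_{ij}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right] \right\} \\
&\quad + \sum_{k=1}^{m} \sum_{j=1}^{n} \left\{ \log[Q(\theta_k, \delta_j)] \left[ \sum_{i=1}^{N} (1 - y_{ij})\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right] \right\} \\
&= \sum_{k=1}^{m} \sum_{j=1}^{n} \left\{ \log[P(\theta_k, \delta_j)] \left[ \sum_{i=1}^{N} y_{ij}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right] \right\} \\
&\quad + \sum_{k=1}^{m} \sum_{j=1}^{n} \left\{ \log[Q(\theta_k, \delta_j)] \left[ \sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right] \right\} \\
&\quad - \sum_{k=1}^{m} \sum_{j=1}^{n} \left\{ \log[Q(\theta_k, \delta_j)] \left[ \sum_{i=1}^{N} y_{ij}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right] \right\} .
\end{aligned}
\tag{18}
$$

A simpler computational formula for Equation 18 can be obtained by using Equation 15 
to compute 

$$
n_k^{(s)} = \sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) = \sum_{i=1}^{N} \frac{f(y_i \mid \theta_k, \Delta^{(s)})\, \pi_k^{(s)}}{\sum_{k'=1}^{m} f(y_i \mid \theta_{k'}, \Delta^{(s)})\, \pi_{k'}^{(s)}} , \tag{19}
$$

and 

$$
r_{jk}^{(s)} = \sum_{i=1}^{N} y_{ij}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) = \sum_{i=1}^{N} \frac{y_{ij}\, f(y_i \mid \theta_k, \Delta^{(s)})\, \pi_k^{(s)}}{\sum_{k'=1}^{m} f(y_i \mid \theta_{k'}, \Delta^{(s)})\, \pi_{k'}^{(s)}} . \tag{20}
$$

Substituting Equations 19 and 20 into Equation 18 gives 

$$
\phi(\Delta) = \sum_{k=1}^{m} \sum_{j=1}^{n} \left\{ \log[P(\theta_k, \delta_j)]\, r_{jk}^{(s)} + \log[Q(\theta_k, \delta_j)] \left( n_k^{(s)} - r_{jk}^{(s)} \right) \right\} . \tag{21}
$$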



The quantity $n_k^{(s)}$ can be thought of as a provisional estimate of the number of examinees 
in the sample with ability value $\theta_k$. The quantity $r_{jk}^{(s)}$ can be thought of as a provisional 
estimate of the number of examinees in the sample with ability value $\theta_k$ who answer item 
$j$ correctly. Note the notational distinction that though $n$ denotes the number of items 
on the test, the $n_k^{(s)}$ represent estimates of the number of examinees with specified ability 
value $\theta_k$ at iteration $s$. 
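
The provisional counts in Equations 19 and 20 are simple weighted sums once the posterior probabilities from Equation 15 are available. The sketch below assumes those probabilities are stored as `post[i][k]`; the function name and the small data set are hypothetical.

```python
def expected_counts(Y, post):
    """Return (n, r) with n[k] from Equation 19 and r[j][k] from Equation 20.

    Y[i][j] is the 0/1 response of examinee i to item j;
    post[i][k] is p(theta_k | y_i, Delta, pi) from the E step.
    """
    N, n_items, m = len(Y), len(Y[0]), len(post[0])
    n = [sum(post[i][k] for i in range(N)) for k in range(m)]
    r = [[sum(Y[i][j] * post[i][k] for i in range(N)) for k in range(m)]
         for j in range(n_items)]
    return n, r

# Three examinees, two items, two latent categories (made-up values).
Y = [[1, 0], [1, 1], [0, 0]]
post = [[0.7, 0.3], [0.2, 0.8], [0.9, 0.1]]
n, r = expected_counts(Y, post)
```

Here the `n[k]` sum to the number of examinees, and each `r[j][k]` is at most `n[k]`, consistent with their reading as expected counts.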

The E step at iteration $s$ consists of computing the values $n_k^{(s)}$ and $r_{jk}^{(s)}$ using values 
of $\Delta^{(s)}$ and $\pi^{(s)}$ computed in iteration $s - 1$ ($\Delta^{(0)}$ and $\pi^{(0)}$ are starting values used in 
iteration 0). In the M step at iteration $s$ the values of $n_k^{(s)}$ and $r_{jk}^{(s)}$ computed in the E step 
are substituted into Equation 21 and the value of $\Delta$, namely $\Delta^{(s+1)}$, that maximizes $\phi(\Delta)$ 
is found. Maximization methods such as Newton-Raphson (Dennis and Schnabel, 1983) 
involve computing the first and second partial derivatives of $\phi(\Delta)$, which in turn involves 
computing the first and second derivatives of $\log[P(\theta_k, \delta_j)]$ and $\log[Q(\theta_k, \delta_j)]$ with respect 
to $\Delta$. These derivatives can be quite complex depending on the form of $P(\theta_k, \delta_j)$. Baker 
(1992) gives detailed derivations of these partial derivatives for various forms of $P(\theta_k, \delta_j)$. 
The details of computing the maximum in the M step will not be presented in this paper. 
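
Although the paper does not give the M step computations, a minimal sketch may help fix ideas for the simplest case. It assumes, purely for illustration, a Rasch-type response function $P(\theta, b_j) = 1/(1 + e^{-(\theta - b_j)})$ with a single difficulty parameter per item. Under that assumption the part of Equation 21 involving item $j$ has first derivative $\sum_k (n_k^{(s)} P_k - r_{jk}^{(s)})$ and second derivative $-\sum_k n_k^{(s)} P_k Q_k$ with respect to $b_j$, so Newton-Raphson takes a simple form. The function name and the numbers are hypothetical.

```python
import math

def m_step_rasch_item(theta, n, r_j, b=0.0, tol=1e-10, max_iter=50):
    """Maximize item j's part of Equation 21 over one difficulty parameter b.

    Assumes P(theta, b) = 1 / (1 + exp(-(theta - b))); theta, n, and r_j are
    lists indexed by latent category k.
    """
    for _ in range(max_iter):
        P = [1.0 / (1.0 + math.exp(-(t - b))) for t in theta]
        grad = sum(n_k * p - r for n_k, p, r in zip(n, P, r_j))   # d phi / d b
        hess = -sum(n_k * p * (1.0 - p) for n_k, p in zip(n, P))  # d2 phi / d b2
        step = grad / hess
        b -= step                                                 # Newton update
        if abs(step) < tol:
            break
    return b

# Made-up expected counts for three latent categories.
b_hat = m_step_rasch_item([-1.0, 0.0, 1.0], [10.0, 20.0, 10.0],
                          [3.0, 10.0, 7.0], b=1.0)
```

In this symmetric example the maximizing value is $b = 0$, since then $\sum_k n_k P_k$ equals $\sum_k r_{jk} = 20$. Models with more parameters per item require the partial derivatives referenced above.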



Computing an Estimate of the Latent Variable Distribution 

This section presents details of the EM algorithm for computing an estimate of $\pi$. 
The procedures presented in this section apply to any item response model for either 
dichotomous or polytomous items. 

Equation 14 can be written as 

$$
\begin{aligned}
\psi(\pi) &= \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log(\pi_k)\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \log(\pi_k) \sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) .
\end{aligned}
\tag{22}
$$

The E step substitutes the values of $n_k^{(s)}$ from Equation 19 into Equation 22 to obtain 

$$
\psi(\pi) = \sum_{k=1}^{m} \log(\pi_k)\, n_k^{(s)} . \tag{23}
$$

The $m$ values $\pi_k$ must sum to one because the $\pi_k$, $k = 1, \ldots, m$, represent the probabilities 
for the discrete random variable $\Theta$. A Lagrange multiplier is used to maximize Equation 
22 subject to the constraint that the $\pi_k$ sum to one. The function to maximize is 

$$
\begin{aligned}
\Psi(\pi, \lambda) &= \sum_{k=1}^{m} \log(\pi_k)\, n_k^{(s)} + \lambda \left( \sum_{k=1}^{m} \pi_k - 1 \right) \\
&= \sum_{k=1}^{m} \log(\pi_k)\, n_k^{(s)} + \lambda \sum_{k=1}^{m} \pi_k - \lambda .
\end{aligned}
\tag{24}
$$



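The partial derivatives of $\Psi(\pi, \lambda)$ with respect to $\pi_k$ are 

$$
\frac{\partial \Psi(\pi, \lambda)}{\partial \pi_k} = \frac{n_k^{(s)}}{\pi_k} + \lambda , \tag{25}
$$

for $k = 1, \ldots, m$. The partial derivative of $\Psi(\pi, \lambda)$ with respect to $\lambda$ is 

$$
\frac{\partial \Psi(\pi, \lambda)}{\partial \lambda} = \sum_{k=1}^{m} \pi_k - 1 . \tag{26}
$$

Setting Equation 25 equal to zero gives 

$$
n_k^{(s)} = -\lambda \pi_k , \tag{27}
$$

for $k = 1, \ldots, m$. Summing both sides of Equation 27 over $k$ gives 

$$
\sum_{k=1}^{m} n_k^{(s)} = -\lambda \sum_{k=1}^{m} \pi_k = -\lambda , \tag{28}
$$

because setting Equation 26 equal to zero implies that $\sum_k \pi_k = 1$. Substituting the value 
for $-\lambda$ given by Equation 28 into Equation 27 gives 

$$
n_k^{(s)} = \pi_k \sum_{k'=1}^{m} n_{k'}^{(s)} . \tag{29}
$$

Solving Equation 29 for $\pi_k$ gives values of $\pi_k^{(s+1)}$ for iteration $s$ of the EM algorithm. The 
estimates of $\pi_k^{(s+1)}$ given by Equation 29 are 

$$
\begin{aligned}
\pi_k^{(s+1)} &= \frac{n_k^{(s)}}{\sum_{k'=1}^{m} n_{k'}^{(s)}} \\
&= \frac{\sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)})}{\sum_{k'=1}^{m} \sum_{i=1}^{N} p(\theta_{k'} \mid y_i, \Delta^{(s)}, \pi^{(s)})} \\
&= \frac{1}{N} \sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \\
&= \frac{\pi_k^{(s)}}{N} \sum_{i=1}^{N} \frac{f(y_i \mid \theta_k, \Delta^{(s)})}{f(y_i \mid \Delta^{(s)}, \pi^{(s)})} .
\end{aligned}
\tag{30}
$$

Going from the second to the third line of Equation 30 follows from the fact that 

$$
\sum_{k'=1}^{m} p(\theta_{k'} \mid y_i, \Delta^{(s)}, \pi^{(s)}) = 1 . \tag{31}
$$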



The $m$ values of $\pi_k^{(s+1)}$ are the new estimates of $\pi_k$ computed at iteration $s$. These values 
are used in the E step at iteration $s + 1$. The values of $\pi_k$ given by Equation 30 are the 
same as those presented in Bock and Aitken (1981), Mislevy (1984), and Titterington, 
Smith, and Makov (1985). 

The values of $\pi_k^{(s+1)}$ for the final iteration of the EM algorithm (when convergence is 
achieved) are not estimates of a posterior distribution for $\Theta$. Rather, they are maximum 
likelihood estimates of the marginal distribution of the discrete random variable $\Theta$ defined 
over the population of examinees. 
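
The closed-form update in Equation 30 amounts to averaging the posterior probabilities over examinees. A minimal sketch, with a hypothetical function name and made-up posterior probabilities:

```python
def update_pi(post):
    """pi_k^(s+1) = (1/N) * sum_i p(theta_k | y_i, Delta, pi) (Equation 30)."""
    N, m = len(post), len(post[0])
    return [sum(row[k] for row in post) / N for k in range(m)]

# post[i][k] for three examinees and two latent categories.
post = [[0.7, 0.3], [0.2, 0.8], [0.9, 0.1]]
pi_new = update_pi(post)
```

Because each row of `post` sums to one (Equation 31), the updated weights automatically sum to one and no explicit renormalization is needed.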

After the final EM iteration the latent variable scale for most item response models 
can be set by linearly transforming the values of $\theta_k$ so that the mean and variance of the 
latent variable distribution are equal to specified values. The item parameters would also 
need to be transformed to be on the same scale. 
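
The rescaling can be sketched as follows for one concrete case. As an assumption for this example only, the item response function depends on $\theta$ through $a_j(\theta - b_j)$ (a two-parameter logistic form), and the target scale has mean 0 and variance 1; the function name is hypothetical.

```python
import math

def standardize_scale(theta, pi, items):
    """Rescale theta_k to mean 0 and variance 1 under pi, adjusting (a, b).

    Assumes each item enters the model only through a * (theta - b), so the
    transformed parameters leave every response probability unchanged.
    """
    mean = sum(p * t for p, t in zip(pi, theta))
    var = sum(p * (t - mean) ** 2 for p, t in zip(pi, theta))
    sd = math.sqrt(var)
    new_theta = [(t - mean) / sd for t in theta]
    new_items = [(a * sd, (b - mean) / sd) for a, b in items]
    return new_theta, new_items

theta = [-2.0, 0.0, 1.0]
pi = [0.2, 0.5, 0.3]
items = [(1.5, 0.4)]
new_theta, new_items = standardize_scale(theta, pi, items)
```

The check that matters is invariance: $a_j(\theta_k - b_j)$ computed on the old scale equals the same quantity computed from the transformed values, so the fitted response probabilities are unchanged.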
To summarize the general EM algorithm for dichotomous item response models, the 
E step at iteration $s$ consists of computing the $n_k^{(s)}$ and the $r_{jk}^{(s)}$ as given in Equations 19 
and 20, and the M step at iteration $s$ consists of finding estimates $\Delta^{(s+1)}$ that maximize 
Equation 21 and computing the $\pi_k^{(s+1)}$ as given in Equation 30. 
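
The summary above can be turned into a compact end-to-end sketch. Several simplifications are assumptions of this example rather than part of the paper: a Rasch-type response function with one difficulty $b_j$ per item, zero and uniform starting values, a fixed number of iterations in place of a convergence test, and a single Newton-Raphson step per item in place of a full maximization of Equation 21. The function name and the data are hypothetical.

```python
import math

def em_rasch(Y, theta, n_iter=100):
    """EM sketch for a Rasch-type model with a discrete latent variable."""
    N, n_items, m = len(Y), len(Y[0]), len(theta)
    b = [0.0] * n_items           # starting item parameters
    pi = [1.0 / m] * m            # starting latent probabilities
    for _ in range(n_iter):
        P = [[1.0 / (1.0 + math.exp(-(t - bj))) for bj in b] for t in theta]
        # E step: posterior weights (Eq. 15), then n_k and r_jk (Eqs. 19, 20).
        n = [0.0] * m
        r = [[0.0] * m for _ in range(n_items)]
        for y in Y:
            joint = []
            for k in range(m):
                lik = pi[k]
                for j in range(n_items):
                    lik *= P[k][j] if y[j] == 1 else 1.0 - P[k][j]
                joint.append(lik)
            total = sum(joint)
            for k in range(m):
                w = joint[k] / total
                n[k] += w
                for j in range(n_items):
                    r[j][k] += y[j] * w
        # M step for pi (Eq. 30).
        pi = [n_k / N for n_k in n]
        # M step for the item parameters: one Newton step on Eq. 21 per item.
        for j in range(n_items):
            grad = sum(n[k] * P[k][j] - r[j][k] for k in range(m))
            hess = -sum(n[k] * P[k][j] * (1.0 - P[k][j]) for k in range(m))
            b[j] -= grad / hess
    return b, pi

# Eight examinees, three items; item 1 is answered correctly most often.
Y = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0],
     [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1]]
b_hat, pi_hat = em_rasch(Y, [-1.0, 0.0, 1.0])
```

With these data the estimated difficulties should be ordered inversely to the proportions correct. A production implementation would monitor the observed data log-likelihood and stop when its change becomes negligible.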

Computing Item Parameter Estimates for Polytomous Items 

This section presents details of the EM algorithm for computing item parameter 
estimates for polytomous items. The $L_j$ possible responses to item $j$ are scored $0, 1, \ldots, L_j - 1$. 

Equation 13 can be written in terms of item category response functions for each item. 
The item category response functions for item $j$ are functions of the latent variable and 
the item parameters for item $j$. The item category response functions give the probability 
that an examinee with latent variable value $\theta$ will respond in item response category $l$ 
of item $j$. The item category response functions for item $j$ will be denoted $P_l(\theta, \delta_j)$, 
$l = 0, 1, \ldots, L_j - 1$, where $P_l(\theta, \delta_j)$ is the item category response function for the response 
category corresponding to item score $l$. 

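For item $j$ with item parameters $\delta_j$ and randomly sampled examinee $i$ with ability 
value $\theta_k$ the probability of item response $y_{ij}$ is given by 

$$
f(y_{ij} \mid \theta_k, \delta_j) = \prod_{l=0}^{L_j - 1} P_l(\theta_k, \delta_j)^{I\{y_{ij} = l\}} , \tag{32}
$$

where $I\{y_{ij} = l\}$, $l = 0, 1, \ldots, L_j - 1$, is equal to 1 if $y_{ij} = l$ and zero otherwise. 

Under local independence $f(y_i \mid \theta_k, \Delta)$ can be written using Equation 32 as 

$$
f(y_i \mid \theta_k, \Delta) = \prod_{j=1}^{n} f(y_{ij} \mid \theta_k, \delta_j) = \prod_{j=1}^{n} \prod_{l=0}^{L_j - 1} P_l(\theta_k, \delta_j)^{I\{y_{ij} = l\}} . \tag{33}
$$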

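Equation 13 can be written using Equation 33 as 

$$
\begin{aligned}
\phi(\Delta) &= \sum_{k=1}^{m} \sum_{i=1}^{N} \left\{ \log\left[ \prod_{j=1}^{n} f(y_{ij} \mid \theta_k, \delta_j) \right] p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \sum_{i=1}^{N} \sum_{j=1}^{n} \sum_{l=0}^{L_j - 1} \left\{ \log\left[ P_l(\theta_k, \delta_j)^{I\{y_{ij} = l\}} \right] p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right\} \\
&= \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{l=0}^{L_j - 1} \left\{ \log[P_l(\theta_k, \delta_j)] \left[ \sum_{i=1}^{N} I\{y_{ij} = l\}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \right] \right\} .
\end{aligned}
\tag{34}
$$

Equation 34 can be written as 

$$
\phi(\Delta) = \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{l=0}^{L_j - 1} \log[P_l(\theta_k, \delta_j)]\, r_{jkl}^{(s)} , \tag{35}
$$

where 

$$
r_{jkl}^{(s)} = \sum_{i=1}^{N} I\{y_{ij} = l\}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) = \sum_{i=1}^{N} \frac{I\{y_{ij} = l\}\, f(y_i \mid \theta_k, \Delta^{(s)})\, \pi_k^{(s)}}{\sum_{k'=1}^{m} f(y_i \mid \theta_{k'}, \Delta^{(s)})\, \pi_{k'}^{(s)}} . \tag{36}
$$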



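Note that the sum of $r_{jkl}^{(s)}$ over item response categories equals $n_k^{(s)}$: 

$$
\begin{aligned}
\sum_{l=0}^{L_j - 1} r_{jkl}^{(s)} &= \sum_{l=0}^{L_j - 1} \sum_{i=1}^{N} I\{y_{ij} = l\}\, p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \\
&= \sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) \sum_{l=0}^{L_j - 1} I\{y_{ij} = l\} \\
&= \sum_{i=1}^{N} p(\theta_k \mid y_i, \Delta^{(s)}, \pi^{(s)}) = n_k^{(s)} .
\end{aligned}
\tag{37}
$$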



The $r_{jkl}^{(s)}$ can be thought of as provisional estimates of the number of examinees in the 
sample with ability value $\theta_k$ who responded in category $l$ of item $j$. The $n_k^{(s)}$, as before, 
may be considered provisional estimates of the number of examinees with ability value 
$\theta_k$, $k = 1, \ldots, m$. 

The E step at iteration $s$ consists of computing the values of $r_{jkl}^{(s)}$ and $n_k^{(s)}$ using 
values of $\Delta^{(s)}$ and $\pi^{(s)}$ computed in iteration $s - 1$ ($\Delta^{(0)}$ and $\pi^{(0)}$ are starting values 
used in iteration 0). In the M step at iteration $s$ the values of $n_k^{(s)}$ computed in the E 
step are substituted into Equation 30 to obtain the estimates $\pi_k^{(s+1)}$, and the values of 
$r_{jkl}^{(s)}$ computed in the E step are substituted into Equation 35 and the value of $\Delta$, namely 
$\Delta^{(s+1)}$, that maximizes $\phi(\Delta)$ is found. As was the case for dichotomous items, this can 
involve computing the first and second partial derivatives of $\phi(\Delta)$, which in turn involves 
computing the first and second derivatives of $\log[P_l(\theta_k, \delta_j)]$ with respect to $\Delta$. For an 
example, see Muraki (1992) where details of the M step computation for the generalized 
partial credit model are presented. 

For the case of two response categories for every item ($L_j = 2$ for all $j$) Equation 
35 is equivalent to Equation 21 with $r_{jk1}^{(s)} = r_{jk}^{(s)}$ and $r_{jk0}^{(s)} = n_k^{(s)} - r_{jk}^{(s)}$. Furthermore, 

$$
n_k^{(s)} = r_{jk0}^{(s)} + r_{jk1}^{(s)} .
$$
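
The category counts of Equation 36 can be accumulated in a single pass over the examinees, as the sketch below shows. The function name, the data layout (`Y[i][j]` holding the integer category, `post[i][k]` the posterior probability from Equation 15), and the numbers are assumptions made for illustration.

```python
def category_counts(Y, post, n_cats):
    """r[j][k][l] = sum_i I{y_ij = l} * p(theta_k | y_i, ...) (Equation 36).

    n_cats[j] is L_j, the number of response categories of item j.
    """
    n_items, m = len(n_cats), len(post[0])
    r = [[[0.0] * n_cats[j] for _ in range(m)] for j in range(n_items)]
    for y, w in zip(Y, post):
        for j, y_ij in enumerate(y):
            for k in range(m):
                r[j][k][y_ij] += w[k]   # only the observed category gets weight
    return r

# Three examinees; item 1 has 2 categories, item 2 has 3.
Y = [[0, 2], [1, 1], [1, 0]]
post = [[0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]
r = category_counts(Y, post, n_cats=[2, 3])
```

Summing `r[j][k]` over categories gives the same value for every item $j$, namely $n_k^{(s)}$, which is the content of Equation 37.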



Computing Bayes Modal Estimates using the EM Algorithm 

The EM algorithm can be modified to produce the posterior mode of $(\Delta, \pi)$ (Dempster, 
Laird, and Rubin, 1977, p. 6). If the log of the prior distribution of $(\Delta, \pi)$ is 
$G[(\Delta, \pi)]$ then to produce Bayesian modal estimates of $\Delta$ and $\pi$ (instead of maximum 
likelihood estimates) $Q[(\Delta, \pi) \mid (\Delta^{(s)}, \pi^{(s)})] + G[(\Delta, \pi)]$ is maximized in the M step. Note 
that the E step calculation does not change. The M step calculation for $\pi$ does not change 
if a uniform prior is used for $\pi$. Prior distributions for $\Delta$ and $\pi$ as contained in $G[(\Delta, \pi)]$ 
are not the same as the values $\Delta^{(0)}$ and $\pi^{(0)}$ used to start the EM iterations. The values 
of $\Delta^{(0)}$ and $\pi^{(0)}$ are starting values, not prior distributions. Mislevy (1986) and 
Tsutakawa and Lin (1986) discuss Bayes modal estimation using the EM algorithm for the 
3-parameter and 2-parameter logistic IRT models (see also Harwell and Baker, 1991). 
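
To make the remark about a uniform prior concrete, the sketch below assumes a Dirichlet prior on $\pi$ with parameters $\alpha_k$. This choice of prior is an assumption of the example and is not specified in the paper. Adding $\sum_k (\alpha_k - 1) \log \pi_k$ to Equation 23 and repeating the Lagrange multiplier argument gives the modal update $\pi_k = (n_k^{(s)} + \alpha_k - 1) / \sum_{k'} (n_{k'}^{(s)} + \alpha_{k'} - 1)$, valid when every numerator is positive.

```python
def map_update_pi(n, alpha):
    """Posterior-mode update for pi under an assumed Dirichlet(alpha) prior.

    n[k] are the expected counts from Equation 19.
    """
    num = [n_k + a_k - 1.0 for n_k, a_k in zip(n, alpha)]
    if min(num) <= 0.0:
        raise ValueError("mode is on the boundary; closed form does not apply")
    total = sum(num)
    return [x / total for x in num]

n = [1.8, 1.2]
pi_uniform = map_update_pi(n, [1.0, 1.0])   # uniform prior on pi
pi_informative = map_update_pi(n, [2.0, 2.0])
```

With all $\alpha_k = 1$ the prior is uniform and the update reduces to Equation 30, in line with the statement above. Larger $\alpha_k$ pull the estimates toward equal probabilities.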



Marginal Maximum Likelihood Using the Bock-Aitken Algorithm 

This section discusses the relationship between the EM algorithm given above and 
the algorithm for marginal maximum likelihood given by Bock and Aitken (1981). Bock 
and Aitken (1981) start with a continuous latent variable and use a discrete version of the 
latent variable for computational purposes (numerical quadrature). Here, a discrete latent 
variable is specified in the model. 

A typical implementation of marginal maximum likelihood uses the marginal 
distribution of the observed variables as calculated using a specified distribution of the latent 
variable (although it is possible to estimate the distribution along with the item 
parameters). Marginal maximum likelihood estimates of the item parameters are those that 
maximize the marginal likelihood of the observed variables. If $f(y_i, \theta_k \mid \Delta, \pi^*)$ is the 
joint likelihood of the observed and missing data for examinee $i$, and $p(\theta_k \mid \pi_k^*)$ specifies a 
marginal discrete distribution for the latent variable (with known probabilities $\pi_k^*$), then 
the marginal likelihood for examinee $i$ is 

$$
\begin{aligned}
f(y_i \mid \Delta) &= \sum_{k=1}^{m} f(y_i, \theta_k \mid \Delta, \pi^*) \\
&= \sum_{k=1}^{m} f(y_i \mid \theta_k, \Delta)\, p(\theta_k \mid \pi_k^*) \\
&= \sum_{k=1}^{m} f(y_i \mid \theta_k, \Delta)\, \pi_k^* .
\end{aligned}
\tag{38}
$$

The log of the marginal likelihood for the whole sample is 

$$
\begin{aligned}
\log \prod_{i=1}^{N} f(y_i \mid \Delta) &= \sum_{i=1}^{N} \log f(y_i \mid \Delta) \\
&= \sum_{i=1}^{N} \log \left( \sum_{k=1}^{m} f(y_i \mid \theta_k, \Delta)\, \pi_k^* \right) .
\end{aligned}
\tag{39}
$$



The log-likelihood in Equation 39 is the same as the observed data log-likelihood (the log
of Equation 7) that is maximized by the EM algorithm with the exception that the $\pi_k^*$ are
treated as known values in Equation 39 whereas the $\pi_k$ are parameters to be estimated
in the EM algorithm presented previously. Thus, the EM algorithm could be used to
maximize the log-likelihood in Equation 39 using initial values $\pi_k^{(0)} = \pi_k^*$ and setting
$\pi_k^{(s+1)} = \pi_k^{(s)} = \pi_k^*$ for all iterations. The M step for the parameters of the latent variable
distribution would not be performed; rather the same values of $\pi_k^*$ would be used for
every iteration.
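
A minimal sketch of evaluating the log marginal likelihood of Equation 39 is given below. The derivation applies to any item response model; for concreteness the sketch assumes a two-parameter logistic model in slope-intercept form and local independence of the item responses. The function names, the nodes, the weights, the item parameters, and the data are all illustrative values, not taken from the paper.

```python
import math


def irf(theta, a, c):
    """ASSUMED item response function: logistic(a * theta + c)."""
    return 1.0 / (1.0 + math.exp(-(a * theta + c)))


def cond_lik(y, theta, items):
    """f(y_i | theta_k, Delta): product over items, assuming local independence."""
    lik = 1.0
    for u, (a, c) in zip(y, items):
        p = irf(theta, a, c)
        lik *= p if u == 1 else 1.0 - p
    return lik


def marginal_lik(y, thetas, weights, items):
    """Equation 38: sum_k f(y_i | theta_k, Delta) * pi*_k."""
    return sum(cond_lik(y, t, items) * w for t, w in zip(thetas, weights))


def marginal_loglik(data, thetas, weights, items):
    """Equation 39: sum_i log f(y_i | Delta)."""
    return sum(math.log(marginal_lik(y, thetas, weights, items)) for y in data)


thetas = [-1.0, 0.0, 1.0]        # levels of the discrete latent variable
weights = [0.25, 0.5, 0.25]      # known probabilities pi*_k
items = [(1.0, 0.5), (1.2, 0.0), (0.8, -0.4)]   # (slope, intercept) per item
data = [(1, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 0)]

ll = marginal_loglik(data, thetas, weights, items)
```

Because the weights are fixed, the only free quantities in `marginal_loglik` are the item parameters, which is the setting of marginal maximum likelihood with a specified latent distribution.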

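The description of the Bock-Aitken algorithm presented here follows that of Harwell,
Baker, and Zwarts (1988). The maximum value of the log-likelihood in Equation 39 as a
function of $\Delta$ will occur at a value of $\Delta$ for which the derivative of Equation 39 is equal
to zero. The maximum likelihood estimates of the item parameters are found by solving
the system of equations given by setting the derivatives of Equation 39 with respect to
the item parameters equal to zero. The derivative of the log-likelihood (Equation 39) with
respect to $\delta_{rj}$ (the $r$-th item parameter for item $j$) is

$$
\begin{aligned}
\sum_{i=1}^{N} \frac{\partial \log[f(\mathbf{y}_i \mid \Delta)]}{\partial \delta_{rj}}
&= \sum_{i=1}^{N} \frac{1}{f(\mathbf{y}_i \mid \Delta)} \frac{\partial f(\mathbf{y}_i \mid \Delta)}{\partial \delta_{rj}} \\
&= \sum_{i=1}^{N} \frac{1}{f(\mathbf{y}_i \mid \Delta)} \sum_{k=1}^{m} \frac{\partial f(\mathbf{y}_i \mid \theta_k, \Delta)}{\partial \delta_{rj}}\, \pi_k^* \\
&= \sum_{i=1}^{N} \sum_{k=1}^{m} \frac{\pi_k^*}{f(\mathbf{y}_i \mid \Delta)} \frac{\partial f(\mathbf{y}_i \mid \theta_k, \Delta)}{\partial \delta_{rj}} .
\end{aligned}
\tag{40}
$$

Using the equality

$$
\frac{\partial f(\mathbf{y}_i \mid \theta_k, \Delta)}{\partial \delta_{rj}}
= \frac{\partial \log[f(\mathbf{y}_i \mid \theta_k, \Delta)]}{\partial \delta_{rj}}\, f(\mathbf{y}_i \mid \theta_k, \Delta),
\tag{41}
$$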



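Equation 40 can be written as

$$
\begin{aligned}
\sum_{i=1}^{N} \frac{\partial \log[f(\mathbf{y}_i \mid \Delta)]}{\partial \delta_{rj}}
&= \sum_{i=1}^{N} \sum_{k=1}^{m} \frac{f(\mathbf{y}_i \mid \theta_k, \Delta)\, \pi_k^*}{f(\mathbf{y}_i \mid \Delta)}
   \frac{\partial \log[f(\mathbf{y}_i \mid \theta_k, \Delta)]}{\partial \delta_{rj}} \\
&= \sum_{i=1}^{N} \sum_{k=1}^{m} \frac{\partial \log[f(\mathbf{y}_i \mid \theta_k, \Delta)]}{\partial \delta_{rj}}
   \frac{f(\mathbf{y}_i \mid \theta_k, \Delta)\, \pi_k^*}{\sum_{k'=1}^{m} f(\mathbf{y}_i \mid \theta_{k'}, \Delta)\, \pi_{k'}^*} \\
&= \sum_{k=1}^{m} \sum_{i=1}^{N} \frac{\partial \log[f(\mathbf{y}_i \mid \theta_k, \Delta)]}{\partial \delta_{rj}}\,
   p(\theta_k \mid \mathbf{y}_i, \Delta, \pi^*).
\end{aligned}
\tag{42}
$$

Equation 15 (with $\Delta^{(s)}$ replaced by $\Delta$) is used in going from the second to third line in
Equation 42.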
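
The identity in Equation 42 can be checked numerically. The sketch below does so for one parameter of one item, comparing the posterior-weighted sum on the last line of Equation 42 with a central finite difference of the log marginal likelihood in Equation 39. It assumes a two-parameter logistic model in slope-intercept form, for which the derivative of the conditional log-likelihood with respect to the intercept of item $j$ is the response minus the response probability; the model, the function names, and all numerical values are assumptions made for this example.

```python
import math


def irf(theta, a, c):
    """ASSUMED item response function: logistic(a * theta + c)."""
    return 1.0 / (1.0 + math.exp(-(a * theta + c)))


def cond_lik(y, theta, items):
    """f(y_i | theta_k, Delta), assuming local independence."""
    lik = 1.0
    for u, (a, c) in zip(y, items):
        p = irf(theta, a, c)
        lik *= p if u == 1 else 1.0 - p
    return lik


def loglik(data, thetas, w, items):
    """Log marginal likelihood of Equation 39."""
    return sum(math.log(sum(cond_lik(y, t, items) * wk
                            for t, wk in zip(thetas, w))) for y in data)


def score_intercept(data, thetas, w, items, j):
    """Last line of Equation 42, with delta_rj taken to be the intercept of item j."""
    a, c = items[j]
    total = 0.0
    for y in data:
        joint = [cond_lik(y, t, items) * wk for t, wk in zip(thetas, w)]
        marg = sum(joint)
        for t, jk in zip(thetas, joint):
            post = jk / marg                 # p(theta_k | y_i, Delta, pi*)
            dlog = y[j] - irf(t, a, c)       # d log f(y_i | theta_k, Delta) / d c_j
            total += dlog * post
    return total


thetas = [-1.5, 0.0, 1.5]
w = [0.3, 0.4, 0.3]
items = [(1.0, 0.2), (1.3, -0.1), (0.7, 0.4)]
data = [(1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]

j, h = 1, 1e-6
a, c = items[j]
plus = items[:j] + [(a, c + h)] + items[j + 1:]
minus = items[:j] + [(a, c - h)] + items[j + 1:]
numeric = (loglik(data, thetas, w, plus) - loglik(data, thetas, w, minus)) / (2.0 * h)
analytic = score_intercept(data, thetas, w, items, j)
```

Note that `score_intercept` recomputes the posterior weights from the same `items` it differentiates, so the parameters of every item enter the expression through the function $p$.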

The item parameter estimates that are computed in the M step of the EM algorithm
are the solutions to a system of equations that results from setting the derivatives of
Equation 13, with respect to the item parameters, equal to zero. The derivative of Equation
13 with respect to $\delta_{rj}$ is

$$
\sum_{k=1}^{m} \sum_{i=1}^{N} \frac{\partial \log[f(\mathbf{y}_i \mid \theta_k, \Delta)]}{\partial \delta_{rj}}\,
p(\theta_k \mid \mathbf{y}_i, \Delta^{(s)}, \pi^{(s)}).
\tag{43}
$$



Equations 43 and 42 are identical with the exception that in Equation 43 $\Delta^{(s)}$ is used in
place of $\Delta$ and $\pi^{(s)}$ is used in place of $\pi^*$ in the function $p$. This distinction can have a
significant effect on the computational effort needed to solve the system of equations. In
Equation 43 the item parameters that are being solved for appear only in the derivative
(the first term in the product inside the sums), not in the function $p$ (where $\Delta^{(s)}$ is
treated as a constant). In Equation 42 the item parameters appear in both the derivative
and the function $p$. It is usually the case that the derivative in Equation 43 will depend
only on the item parameters for item $j$, so that Equation 43 as a whole depends only
on the item parameters for item $j$. Consequently, in the EM algorithm the parameter
estimates for each item can be solved for separately. In contrast, Equation 42 will depend
on the parameters for all items, not just the parameters for item $j$. The computationally
simpler approach of separately solving for the parameters of each individual item cannot
be used with Equation 42. The difference between using Equation 42 versus Equation
43 for estimating the item parameters is illustrated by Tanner (1996, Section 4.1) for the
two-parameter logistic item response model.

Marginal maximum likelihood estimates that are solutions to the system of equations 
given by setting Equation 42 equal to zero for all item parameters have been presented for 
some specific item response models. Thissen (1982) presented marginal maximum likeli- 
hood estimates of the item parameters for the one-parameter logistic model for dichotomous 
items. Bock and Lieberman (1970) presented marginal maximum likelihood estimates for 
a two-parameter normal ogive model for dichotomous items, but their solution is only 
computationally practical for a small number of items. 

The computational complexity of estimates based on using Equation 42 led Bock and
Aitken (1981) to suggest a two-step algorithm for computing marginal maximum likelihood
estimates of the item parameters. Iteration $s$ ($s = 0, 1, \ldots$) of the Bock-Aitken algorithm
consists of two steps. In the first step of the Bock-Aitken algorithm at iteration $s$ item
parameters computed at iteration $s - 1$ ($\Delta^{(s)}$, where $\Delta^{(0)}$ are starting values) are used
to calculate the values of $p(\theta_k \mid \mathbf{y}_i, \Delta^{(s)}, \pi^*)$ using Equation 15. In the second step at
iteration $s$ item parameters that are the solution of

$$
\sum_{i=1}^{N} \sum_{k=1}^{m} \frac{\partial \log[f(\mathbf{y}_i \mid \theta_k, \Delta)]}{\partial \delta_{rj}}\,
p(\theta_k \mid \mathbf{y}_i, \Delta^{(s)}, \pi^*) = 0
\tag{44}
$$

are found ($\Delta^{(s+1)}$). It is easier to solve for $\Delta$ in Equation 44 than in Equation 42 because
Equation 42 contains $\Delta$ as a part of the function $p$, whereas there is no $\Delta$ in the function
$p$ in Equation 44 ($\Delta^{(s)}$ in the function $p$ in Equation 44 is known). The values of $\Delta^{(s+1)}$
found in iteration $s$ are used in iteration $s + 1$ to compute $p(\theta_k \mid \mathbf{y}_i, \Delta^{(s+1)}, \pi^*)$. This
two-step process continues until the item parameters converge.
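
The two-step iteration can be sketched as follows. This is not the authors' or Bock and Aitken's implementation: it assumes a two-parameter logistic model in slope-intercept form, local independence, and a Newton-Raphson solver with step halving for the second step, all of which are choices made here for illustration. The function names, nodes, weights, data, and starting values are hypothetical.

```python
import math


def irf(theta, a, c):
    """ASSUMED item response function logistic(a*theta + c), overflow-safe."""
    z = a * theta + c
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cond_lik(y, theta, items):
    """f(y_i | theta_k, Delta), assuming local independence."""
    lik = 1.0
    for u, (a, c) in zip(y, items):
        p = irf(theta, a, c)
        lik *= p if u == 1 else 1.0 - p
    return lik


def loglik(data, thetas, w, items):
    """Log marginal likelihood (Equation 39) with fixed weights w = pi*."""
    return sum(math.log(sum(cond_lik(y, t, items) * wk
                            for t, wk in zip(thetas, w))) for y in data)


def first_step(data, thetas, w, items):
    """p(theta_k | y_i, Delta^(s), pi*) for every examinee."""
    post = []
    for y in data:
        joint = [cond_lik(y, t, items) * wk for t, wk in zip(thetas, w)]
        marg = sum(joint)
        post.append([jk / marg for jk in joint])
    return post


def item_objective(a, c, thetas, n, r):
    """Terms of the expected log-likelihood that involve a single item."""
    total = 0.0
    for t, nk, rk in zip(thetas, n, r):
        p = min(max(irf(t, a, c), 1e-12), 1.0 - 1e-12)  # guard the logs
        total += rk * math.log(p) + (nk - rk) * math.log(1.0 - p)
    return total


def second_step_item(a, c, thetas, n, r, max_newton=20):
    """Solve Equation 44 for one item by Newton-Raphson with step halving."""
    for _ in range(max_newton):
        ga = gc = haa = hac = hcc = 0.0
        for t, nk, rk in zip(thetas, n, r):
            p = irf(t, a, c)
            resid = rk - nk * p          # contribution to the gradient
            wgt = nk * p * (1.0 - p)     # contribution to the negative Hessian
            ga += resid * t
            gc += resid
            haa += wgt * t * t
            hac += wgt * t
            hcc += wgt
        det = haa * hcc - hac * hac
        if det <= 1e-12:
            break
        da = (hcc * ga - hac * gc) / det
        dc = (haa * gc - hac * ga) / det
        base = item_objective(a, c, thetas, n, r)
        step = 1.0
        while step > 1e-8 and item_objective(a + step * da, c + step * dc,
                                             thetas, n, r) < base:
            step *= 0.5
        if step <= 1e-8:
            break
        a, c = a + step * da, c + step * dc
    return a, c


def bock_aitken(data, thetas, w, items, iterations=10):
    """Alternate the two steps; the weights w are held fixed throughout."""
    m = len(thetas)
    history = [loglik(data, thetas, w, items)]
    for _ in range(iterations):
        post = first_step(data, thetas, w, items)
        n = [sum(row[k] for row in post) for k in range(m)]
        updated = []
        for j, (a, c) in enumerate(items):   # each item is solved separately
            r = [sum(row[k] for row, y in zip(post, data) if y[j] == 1)
                 for k in range(m)]
            updated.append(second_step_item(a, c, thetas, n, r))
        items = updated
        history.append(loglik(data, thetas, w, items))
    return items, history


thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [0.1, 0.2, 0.4, 0.2, 0.1]
data = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0),
        (1, 0, 1), (0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]
start = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
estimates, history = bock_aitken(data, thetas, w, start)
```

In this sketch the second step only needs, for each item, the expected counts `n` and `r` accumulated from the first step, so the loop over items never touches the parameters of any other item. The step halving is a safeguard added here so that each second step does not decrease its objective.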

The left side of Equation 44 is equal to Equation 43 with $\pi^*$ substituted for $\pi^{(s)}$, so the
item parameters that are the solution to Equation 44 are the same as the item parameters
that maximize Equation 13 (with $\pi^*$ substituted for $\pi^{(s)}$). Consequently, the Bock-Aitken
algorithm is identical to the EM algorithm where the parameters of the latent variable
distribution, namely $\pi$, are specified and not estimated. The first step of the Bock-Aitken
algorithm corresponds to the E step of the EM algorithm, and the second step of the
Bock-Aitken algorithm corresponds to the M step of the EM algorithm with the exception
that $\pi = \pi^*$ is fixed and need not be estimated.

This section has discussed marginal maximum likelihood estimates for the case where 
the latent variable distribution is assumed known. It is also possible to estimate the 
latent variable probabilities along with the item parameters. In this case the Bock-Aitken 
algorithm is the same as the EM algorithm described in previous sections (where both 
$\pi$ and $\Delta$ were estimated). This last statement assumes that both methods start with
the same $\theta_k$ values (nodes in numerical quadrature), and the same $\pi_k^{(0)}$ values (weights in
numerical quadrature).

Summary 

This paper presents detailed derivations of established results using a finite mixture 
approach. Estimates of parameters for item response models using the EM algorithm were 
derived treating the latent ability variable as discrete, in which case the distribution of 
observed item responses is a finite mixture. Maximum likelihood estimates of the item 
parameters and the latent variable distribution were obtained by a straightforward appli- 
cation of the general EM algorithm for finite mixtures. General results were presented for 
dichotomous item response models and for polytomous item response models. A closed- 
form solution for estimates of the latent ability distribution was given that applies to any 
item response model. Estimates for the item parameters will depend on the specific form 
of the item response functions, and will usually require iterative numerical procedures.
Finally, it was shown that the EM algorithm is the same as the Bock-Aitken algorithm for
marginal maximum likelihood estimation of the item parameters.

Discussion 

This paper focused on the case of a univariate real-valued latent variable. It is straightforward
to extend the estimation procedures presented for dichotomous and polytomous
items to other cases of interest. One example is the case of latent class models where the 
discrete latent variable is nominal. Everitt (1984) and Bartholomew (1987) discuss using 
the EM algorithm to obtain maximum likelihood estimates for latent class models. Another 
example is the case of a multivariate latent variable. It is straightforward to generalize 
the formulas presented in this paper for a discrete univariate latent variable to a discrete 
multivariate latent variable although the larger number of categories for a multivariate 
discrete latent variable could greatly increase the amount of computation required. 

Besides the estimates of the item parameters and the discrete ability distribution that 
have been presented, an estimate of each examinee’s ability may also be of interest. An 
empirical Bayes approach to obtaining an estimate of the ability value for the ith examinee 
is to use the distribution of the latent ability variable given by Equation 15 with the values
of $\Delta^{(s)}$ and $\pi^{(s)}$ given by the final iteration of the EM algorithm (Tsutakawa and Soltys,
1988; Bock and Aitken, 1981; Bernardo and Smith, 1994). Then the mean or mode of the
distribution $p(\theta_k \mid \mathbf{y}_i, \Delta^{(s)}, \pi^{(s)})$, $k = 1, \ldots, m$, can be used as an estimate of the ability
of examinee $i$. These estimates are not true Bayesian estimates. A true Bayesian analysis
would be based on the distribution $p(\theta_k \mid \mathbf{y}_i)$ with the item parameters marginalized out
(see Equation 5 in Tsutakawa and Soltys, 1988). Tsutakawa and Soltys (1988) present an 
approximation to a Bayesian solution for dichotomous item response models. 
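
The empirical Bayes ability estimates described above can be sketched as follows. The item parameters and latent probabilities stand in for the values from a final EM iteration and are made up for this example; the two-parameter logistic model in slope-intercept form and the function names are also assumptions of the sketch.

```python
import math


def posterior(y, thetas, pi, items):
    """p(theta_k | y_i, Delta, pi) for one examinee, assuming a 2PL model."""
    joint = []
    for t, pk in zip(thetas, pi):
        lik = 1.0
        for u, (a, c) in zip(y, items):
            p = 1.0 / (1.0 + math.exp(-(a * t + c)))
            lik *= p if u == 1 else 1.0 - p
        joint.append(lik * pk)
    marg = sum(joint)
    return [jk / marg for jk in joint]


def ability_mean(post, thetas):
    """Mean of the discrete posterior distribution of ability."""
    return sum(p * t for p, t in zip(post, thetas))


def ability_mode(post, thetas):
    """Level of the latent variable with the largest posterior probability."""
    best = max(range(len(post)), key=lambda k: post[k])
    return thetas[best]


# Hypothetical final estimates: (slope, intercept) per item and pi_k.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
pi_hat = [0.1, 0.2, 0.4, 0.2, 0.1]
items_hat = [(1.1, 0.3), (0.9, 0.0), (1.4, -0.5)]

post_high = posterior((1, 1, 1), thetas, pi_hat, items_hat)
post_low = posterior((0, 0, 0), thetas, pi_hat, items_hat)
mean_high = ability_mean(post_high, thetas)
mean_low = ability_mean(post_low, thetas)
mode_high = ability_mode(post_high, thetas)
```

The item parameters are plugged in as if known, which is why these are empirical Bayes rather than fully Bayesian estimates.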

In this paper the latent ability variable has been taken to be discrete in the model
specification. It may be more natural to specify a continuous distribution for the latent
ability variable. For the Rasch model it has been shown that as long as enough levels
of the latent ability variable are used ($(n + 2)/2$ if $n$ is even or $(n + 1)/2$ if $n$ is odd,
where $n$ is the number of items) then the class of models using a discrete latent ability
variable is the same as the class of models using a continuous latent ability variable, and the
maximum likelihood estimates of the item parameters using the EM algorithm described
in this paper are asymptotically identical to conditional maximum likelihood estimates
of the item parameters (De Leeuw & Verhelst, 1986; Follmann, 1988; Lindsay, Clogg, &
Grego, 1991).

If an estimate of a continuous ability is needed, then various methods (e.g., kernel
estimators) may be used to fit a continuous distribution to the final estimates of the $\pi_k$
(Tapia and Thompson, 1978). If the latent ability variable is assumed to be continuous
and in the E step of the general EM algorithm the rectangle rule is used to compute the
integral of the latent ability variable, then that method is computationally similar to the
procedure presented here that assumes a discrete latent ability variable with the proviso
that the $\theta_k$ values are equally spaced.
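
To make the correspondence concrete, the sketch below builds equally spaced nodes and weights using the midpoint form of the rectangle rule. It assumes a standard normal density for the continuous latent ability, which is a choice made for this example only; the range, the number of nodes, and the function name are also hypothetical.

```python
import math


def rectangle_rule_grid(lower, upper, m):
    """Equally spaced nodes theta_k and normalized weights.

    The standard normal density used here is an ASSUMPTION of this sketch.
    """
    width = (upper - lower) / m
    # Midpoint of each of the m rectangles.
    nodes = [lower + (k + 0.5) * width for k in range(m)]
    # Rectangle area: width times the density evaluated at the node.
    raw = [width * math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
           for t in nodes]
    total = sum(raw)
    weights = [x / total for x in raw]   # rescale so the weights sum to one
    return nodes, weights


nodes, weights = rectangle_rule_grid(-4.0, 4.0, 20)
```

The resulting `nodes` and `weights` play the roles of the $\theta_k$ values and starting probabilities of a discrete latent variable.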

It seems likely that as long as enough levels of a discrete latent variable are used,
not much, if anything, will be lost by assuming a discrete rather than a continuous latent
variable. When a real-valued discrete latent variable is used, one needs to decide on the 
number of levels to use and the values of the latent variable to use at each level (a similar 
decision would need to be made when assuming a continuous latent variable if numerical 
quadrature were employed). As noted above some theoretical results on the number of 
levels needed are available for the Rasch model. 

Results pertaining to the estimation of histograms for observed continuous random
variables may have some value in obtaining a rough estimate of the number of levels for the
latent ability variable. Terrell and Scott (1985) propose $(2N)^{1/3}$, or a convenient slightly
larger integer, as the optimal number of bins to use in constructing a histogram from
continuous data. Alternatively, goodness of fit tests for various values of $m$, the number
of ability levels, could be used to select a value of $m$ that best fits the data (Titterington,
Smith, and Makov, 1985, pg. 150). Experience indicates that 20 levels of the latent
variable are about the minimum number needed to give reasonable results for two- and
three-parameter logistic models for dichotomous items.
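
As a small worked example of the histogram-based rule of thumb, the sketch below evaluates $(2N)^{1/3}$ and rounds up to the next integer. Rounding up is one reading of "a convenient slightly larger integer"; the function name and sample sizes are illustrative.

```python
import math


def histogram_rule_levels(n_examinees):
    """Rough number of levels from the (2N)^(1/3) rule, rounded up."""
    return math.ceil((2.0 * n_examinees) ** (1.0 / 3.0))


levels_1000 = histogram_rule_levels(1000)   # (2000)^(1/3) is about 12.6
levels_2500 = histogram_rule_levels(2500)   # (5000)^(1/3) is about 17.1
```

For samples of this size the rule gives fewer levels than the 20 suggested by experience, so it is best read as a rough lower guide rather than a recommendation.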
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